Related Application: 

This application claims priority from Provisional application No. 60/158,714, 
filed on October 8, 1999. This application is also related to a Provisional 
application No. 60/158,713, also filed on October 8, 1999. 

Background of the Invention: 

In multi-user communication over linear, dispersive, and noisy channels, the 
received signal is composed of the sum of several transmitted signals corrupted by 
inter-symbol interference, inter-user interference, and noise. Examples include 
TDMA digital cellular systems with multiple transmit/receive antennas, wide-band 
asynchronous CDMA systems, where inter-user interference is also known as 
multiple access interference, wide-band transmission over digital subscriber lines 
(DSL) where inter-user interference takes the form of near-end and far-end 
crosstalk between adjacent twisted pairs, and high-density digital magnetic 
recording where inter-user interference is due to interference from adjacent tracks. 

Multi-user detection techniques for multi-input multi-output (MIMO) systems 
have been shown to offer significant performance advantages over single user 
detection techniques that treat inter-user interference as additive colored noise 
and lump its effects with thermal noise. Recently, it has been shown that the 
presence of inter-symbol interference in these MIMO systems could enhance 
overall system capacity, provided that effective multi-user detection techniques are 
employed. 

The optimum maximum likelihood sequence estimation (MLSE) receiver for 
MIMO channels was developed by S. Verdu, "Minimum Probability of Error for 
Asynchronous Gaussian Multiple Access Channels," IEEE Transactions on 
Information Theory, January 1986, pp. 85-96. However, its complexity, which 
increases exponentially with the number of users and with the channel memory, 
makes its implementation costly for multi-user detection on channels with severe 
inter-symbol interference. 

Two alternative transceiver structures have recently been proposed for 
MIMO dispersive channels as well. These structures, which are widely used in 
practice for single-input single-output dispersive channels, are the Discrete 
Multitone and the minimum-mean-square-error decision feedback equalizer 
(MMSE-DFE). Work in the latter category includes A. Duel-Hallen, "Equalizers for 
Multiple Input Multiple Output Channels and PAM Systems with Cyclostationary Input 
Sequences," IEEE Journal on Selected Areas in Communications, April 1992, pp. 
630-639; A. Duel-Hallen, "A Family of Multiuser Decision-Feedback Detectors for 
Asynchronous Code Division Multiple Access Channels," IEEE Transactions on 
Communications, Feb/Mar/April 1995, pp. 421-434; J. Yang et al., "Joint 
Transmitter and Receiver Optimization for Multiple Input Multiple Output Systems 
with Decision Feedback," IEEE Transactions on Information Theory, Sep. 1994, 
pp. 1334-1347; and J. Yang et al., "On Joint Transmitter and Receiver Optimization 
for Multiple Input Multiple Output (MIMO) Transmission Systems," IEEE 
Transactions on Communications, Dec. 1994, pp. 3221-3231. Alas, the prior art 
does not offer a practical MIMO MMSE-DFE receiver with feedforward and 
feedback FIR filters whose coefficients can be computed in a single computation 
(i.e., non-iteratively) in real-time under various MIMO detection scenarios. 

Summary 

An advance in the art is realized with a receiver having a multiple number of 
receiving antennas that feed a MIMO feedforward filter that is constructed from 
FIR filters with coefficients that are computed based on environment parameters 
that are designer-specified. Signals that are derived from a multiple-output 
feedback filter structure are subtracted from the signals from the multiple outputs 
of the feedforward filter structure, and the resulting difference signals are applied 
to a decision circuit. Given a transmission channel that is modeled as a set of FIR 
filters with memory $\nu$, a matrix $W$ is computed for a feedforward filter that results 
in an effective transmission channel $B$ with memory $N_b$, where $B$ is 
optimized so that $B_{opt} = \arg\min_{B} \mathrm{trace}(R_{ee})$ subject to selected constraints, $R_{ee}$ 
being the error autocorrelation function. The feedback filter is modeled by 
$[I_{n_i} \;\; 0_{n_i \times n_i N_b}] - B^*$, where $n_i$ is the number of outputs in the feedforward filter, as 
well as the number of outputs in the feedback filter. 

The coefficients of the feedforward and feedback filters, which are sensitive 
to a variety of constraints that can be specified by the designer, are computed by a 
processor in a non-iterative manner, only as often as the channel 
characteristics are expected to change. 

Brief Description of the Drawing 

FIG. 1 shows the major elements of a receiver in accord with the principles 
disclosed herein; 

FIG. 2 presents the structure of elements 23 and 26; 

FIG. 3 is a flowchart describing one method carried out by processor 22; 
and 

FIG. 4 is a flowchart describing another method carried out by processor 22. 

Detailed Description 

FIG. 1 depicts the general case of an arrangement with $n_q$ transmitting 
antennas 11-1, 11-2, ... 11-$n_q$, that output signals (e.g., space-time encoded 
signals) to a transmission channel, and $n_o$ receiving antennas 21-1, 21-2, ... 21-$n_o$. 
Each transmitting antenna $p$ outputs a complex-valued signal $x_p$, the signals of the 
$n_q$ antennas pass through a noisy transmission channel, and the $n_o$ receiving 
antennas capture the signals that passed through the transmission channel. The 
received signals are oversampled by a factor of $l$ in element 20 and applied to 
feedforward $W$ filters 23. Thus, the sampling clock at the output of element 20 is 
of period $T_s = T/l$, where $T$ is the inter-symbol period at the transmitting antennas. 
The transmission channel's characterization is also referenced to $T_s$. 

Filter bank 23 delivers an $n_i$ plurality of output signals ($n_i$ can equal $n_q$, for 
example) from which feedback signals are subtracted in circuit 24, with the 
resulting differences applied to decision circuits 25 (comprising conventional 
slicers). The outputs of decision circuits 25 are applied to feedback filters 26, 
which develop the feedback signals. Processor 22 develops the filter coefficients 
for the filters within elements 23 and 26 and installs the coefficients in the 
filters within these elements, as disclosed in detail below. 

In the illustrative embodiment disclosed herein, the received signal is 
expressed by 

$$ y_k^{(j)} = \sum_{i=1}^{n_i} \sum_{m=0}^{\nu^{(i,j)}} h_m^{(i,j)} x_{k-m}^{(i)} + n_k^{(j)}, \tag{1} $$

where $y_k^{(j)}$ is the signal at time $k$ at the $j^{th}$ receiving antenna, $h_m^{(i,j)}$ is the $m^{th}$ 
coefficient (tap) in the channel impulse response between the $i^{th}$ transmitting 
antenna and the $j^{th}$ receiving antenna, and $n^{(j)}$ is the noise vector at the $j^{th}$ 
receiving antenna. The memory of this path (i.e., the largest value of $m$ for which 
$h_m^{(i,j)}$ is not zero) is denoted by $\nu^{(i,j)}$. 
It may be noted that it is not unreasonable to assume that the memory of the 
transmission channel is the same for all $i,j$ links ($n_i \times n_o$ such links), in which case 
$\nu^{(i,j)} = \nu$. Alternatively, the $\nu^{(i,j)}$ limit in equation (1) can be set to that $\nu$ which 
corresponds to the maximum length of all of the $n_i \times n_o$ channel impulse responses, 
i.e., $\nu = \max_{i,j} \nu^{(i,j)}$. It may also be noted that all of the variables in equation (1) are 
actually $l \times 1$ column vectors, corresponding to the $l$ time samples per symbol in 
the oversampled FIG. 1 arrangement. 

By grouping the received samples from all $n_o$ antennas at symbol time $k$ 
into an $n_o l \times 1$ column vector $y_k$, one can relate $y_k$ to the corresponding $n_i \times 1$ 
(column) vector of input samples as follows 

$$ y_k = \sum_{m=0}^{\nu} H_m x_{k-m} + n_k, \tag{2} $$

where $H_m$ is the MIMO channel coefficients matrix of size $n_o l \times n_i$, $x_{k-m}$ is a size 
$n_i \times 1$ input (column) vector, and $n_k$ is a size $n_o l \times 1$ vector. 

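Over a block of $N_f$ symbol periods, equation (2) can be expressed in matrix 
notation as follows: 

$$ \begin{bmatrix} y_{k+N_f-1} \\ y_{k+N_f-2} \\ \vdots \\ y_k \end{bmatrix} = \begin{bmatrix} H_0 & H_1 & \cdots & H_\nu & 0 & \cdots & 0 \\ 0 & H_0 & H_1 & \cdots & H_\nu & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & H_0 & H_1 & \cdots & H_\nu \end{bmatrix} \begin{bmatrix} x_{k+N_f-1} \\ x_{k+N_f-2} \\ \vdots \\ x_{k-\nu} \end{bmatrix} + \begin{bmatrix} n_{k+N_f-1} \\ n_{k+N_f-2} \\ \vdots \\ n_k \end{bmatrix} \tag{3} $$

or, more compactly, 

$$ y_{k+N_f-1:k} = H x_{k+N_f-1:k-\nu} + n_{k+N_f-1:k}. \tag{4} $$

The subscripts in equation (4) indicate a range. For example, $k+N_f-1:k$ means 
the range from $k+N_f-1$ to $k$, inclusively. 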
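The stacking that takes equation (2) to equation (4) can be sketched numerically. The following is a minimal illustration, assuming NumPy; the helper name `build_block_toeplitz`, the toy dimensions, and the random taps are hypothetical and serve only to show where each tap $H_m$ lands in the matrix $H$.

```python
import numpy as np

def build_block_toeplitz(taps, n_f):
    """Stack channel taps H_0..H_nu into the N_f-block-row matrix H of equation (4).

    taps: array of shape (nu + 1, n_o * l, n_i); taps[m] is H_m.
    Returns H of shape (n_f * n_o * l, (n_f + nu) * n_i).
    """
    nu = taps.shape[0] - 1
    rows, cols = taps.shape[1], taps.shape[2]
    H = np.zeros((n_f * rows, (n_f + nu) * cols), dtype=complex)
    for r in range(n_f):              # block row r holds y_{k+N_f-1-r}
        for m in range(nu + 1):       # H_m multiplies x_{k+N_f-1-r-m}
            H[r * rows:(r + 1) * rows, (r + m) * cols:(r + m + 1) * cols] = taps[m]
    return H

# Toy dimensions (assumed for illustration only).
rng = np.random.default_rng(0)
n_i, n_o, l, nu, n_f = 2, 2, 2, 2, 4
shape = (nu + 1, n_o * l, n_i)
taps = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
H = build_block_toeplitz(taps, n_f)

# Inputs x_{k+N_f-1}, ..., x_{k-nu}, newest first, matching the ordering in equation (4).
x_blocks = rng.standard_normal((n_f + nu, n_i)) + 1j * rng.standard_normal((n_f + nu, n_i))
y_stacked = H @ x_blocks.reshape(-1)

# Noise-free equation (2) for the newest output y_{k+N_f-1} ...
y_newest = sum(taps[m] @ x_blocks[m] for m in range(nu + 1))
# ... and for the oldest output y_k, which sits in the last block row.
y_oldest = sum(taps[m] @ x_blocks[n_f - 1 + m] for m in range(nu + 1))
```

Comparing the first and last blocks of `y_stacked` with `y_newest` and `y_oldest` confirms that each block row of `H` is the same tap sequence shifted by one block column.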

It is useful to define the following correlation matrices: 

$$ R_{xy} \equiv E[x_{k+N_f-1:k-\nu} \, y^*_{k+N_f-1:k}] = R_{xx} H^*, \tag{5} $$

$$ R_{yy} \equiv E[y_{k+N_f-1:k} \, y^*_{k+N_f-1:k}] = H R_{xx} H^* + R_{nn}, \tag{6} $$

$$ R_{xx} \equiv E[x_{k+N_f-1:k-\nu} \, x^*_{k+N_f-1:k-\nu}], \text{ and} \tag{7} $$

$$ R_{nn} \equiv E[n_{k+N_f-1:k} \, n^*_{k+N_f-1:k}], \tag{8} $$

and it is assumed that these correlation matrices do not change significantly in 
time or, at least, do not change significantly over a time interval that corresponds 
to a TDMA burst (assumed to be much shorter than the channel coherence time), 
which is much longer than the length of the FIR filters in element 23 (in symbol 
periods denoted by $N_f$). Accordingly, a re-computation within processor 22 of the 
above matrices, and the other parameters disclosed herein, leading to the 
computation of the various filter coefficients, need not take place more often than 
once every TDMA burst. Once $H$, $R_{xx}$ and $R_{nn}$ are ascertained (through the 
conventional use of training sequences), $R_{xy}$ and $R_{yy}$ are computed by $R_{xx} H^*$ and 
$H R_{xx} H^* + R_{nn}$, respectively. 

In accordance with the principles disclosed herein, element 23 comprises a 
collection of FIR filters that are interconnected as shown in FIG. 2, and the impulse 
response coefficients of element 23 can be expressed by $W^* = [W_0^* \;\; W_1^* \;\; \cdots \;\; W_{N_f-1}^*]$, 
having $N_f$ matrix taps $W_i$, each of size ($l n_o \times n_i$). That is, $W_i$ has the form: 

$$ W_i = \begin{bmatrix} w_i^{(1,1)} & \cdots & w_i^{(1,n_i)} \\ \vdots & \ddots & \vdots \\ w_i^{(n_o,1)} & \cdots & w_i^{(n_o,n_i)} \end{bmatrix} \tag{9} $$

where each entry $w_i^{(p,q)}$ is an $l \times 1$ vector corresponding to the $l$ output samples 
per symbol. Stated in other words, the matrix $W_0$ specifies the 0th tap of the set of 
filters within element 23, the matrix $W_1$ specifies the 1st tap of the set of filters 
within element 23, etc. 

Also in accordance with the principles disclosed herein, element 26 
comprises a collection of FIR filters that also are interconnected as shown in 
FIG. 2, and the impulse response coefficients of element 26 are chosen to be equal 
to 

$$ [I_{n_i} \;\; 0_{n_i \times n_i N_b}] - B^* = [(I_{n_i} - B_0^*) \;\; -B_1^* \;\; \cdots \;\; -B_{N_b}^*], \tag{10} $$

where $B^*$ is expressed by $B^* = [B_0^* \;\; B_1^* \;\; \cdots \;\; B_{N_b}^*]$, with $(N_b + 1)$ matrix taps $B_i$, each of 
size $n_i \times n_i$. That is, $B_i$ has the form: 

$$ B_i = \begin{bmatrix} b_i^{(1,1)} & \cdots & b_i^{(1,n_i)} \\ \vdots & \ddots & \vdots \\ b_i^{(n_i,1)} & \cdots & b_i^{(n_i,n_i)} \end{bmatrix}. \tag{11} $$

Stated in other words, $B_0$ specifies the 0th tap of the set of filters within element 
26, the matrix $B_1$ specifies the 1st tap of the set of filters within element 26, etc. 

Defining $\tilde{B}^* \equiv [0_{n_i \times n_i \Delta} \;\; B^*]$, where $\tilde{B}^*$ is a matrix of size $n_i \times n_i(N_f + \nu)$, the 
value of $N_b$ is related to the decision delay by the equality $(\Delta + N_b + 1) = (N_f + \nu)$. 

The error vector at time $k$ is given by 

$$ E_k = \tilde{B}^* x_{k+N_f-1:k-\nu} - W^* y_{k+N_f-1:k}. \tag{12} $$

Therefore, the $n_i \times n_i$ error auto-correlation matrix is 






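$$ \begin{aligned} R_{ee} &\equiv E[E_k E_k^*] \\ &= \tilde{B}^* R_{xx} \tilde{B} - \tilde{B}^* R_{xy} W - W^* R_{yx} \tilde{B} + W^* R_{yy} W \\ &= \tilde{B}^* (R_{xx} - R_{xy} R_{yy}^{-1} R_{yx}) \tilde{B} + (W^* - \tilde{B}^* R_{xy} R_{yy}^{-1}) R_{yy} (W^* - \tilde{B}^* R_{xy} R_{yy}^{-1})^* \\ &\equiv \tilde{B}^* R^{\perp} \tilde{B} + G^* R_{yy} G. \end{aligned} \tag{13} $$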

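Using the Orthogonality Principle, which states that $E[E_k \, y^*_{k+N_f-1:k}] = 0$, it can 
be shown that the optimum matrix feedforward and feedback filters are related by 

$$ \begin{aligned} W^*_{opt} &= \tilde{B}^*_{opt} R_{xy} R_{yy}^{-1} \\ &= \tilde{B}^*_{opt} R_{xx} H^* (H R_{xx} H^* + R_{nn})^{-1} \\ &= \tilde{B}^*_{opt} (R_{xx}^{-1} + H^* R_{nn}^{-1} H)^{-1} H^* R_{nn}^{-1}, \end{aligned} \tag{14} $$

and the $n_i \times n_i$ auto-correlation matrix $R_{ee}$ is 

$$ \begin{aligned} R_{ee} &\equiv E[E_k E_k^*] \\ &= \tilde{B}^* (R_{xx} - R_{xy} R_{yy}^{-1} R_{yx}) \tilde{B} \\ &= \tilde{B}^* R^{\perp} \tilde{B} \\ &= \tilde{B}^* (R_{xx} - R_{xx} H^* (H R_{xx} H^* + R_{nn})^{-1} H R_{xx}) \tilde{B} \\ &= \tilde{B}^* (R_{xx}^{-1} + H^* R_{nn}^{-1} H)^{-1} \tilde{B}. \end{aligned} \tag{15} $$

$R_{ee}$ can also be expressed as $R_{ee} = \tilde{B}^* R^{-1} \tilde{B}$, where $R \equiv R_{xx}^{-1} + H^* R_{nn}^{-1} H$. 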
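The last steps of equations (14) and (15) rest on the matrix inversion lemma. A quick numerical check, assuming NumPy and using randomly generated stand-ins for $H$, $R_{xx}$ and $R_{nn}$ (the sizes and the helper `random_hpd` are hypothetical), shows the two forms agreeing:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hpd(n):
    """Random Hermitian positive-definite matrix (illustrative test data only)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return a @ a.conj().T + n * np.eye(n)

n_x, n_y = 6, 8                      # assumed sizes of the stacked input and output vectors
H = rng.standard_normal((n_y, n_x)) + 1j * rng.standard_normal((n_y, n_x))
R_xx, R_nn = random_hpd(n_x), random_hpd(n_y)
Hh = H.conj().T

R_yy = H @ R_xx @ Hh + R_nn                                  # equation (6)
R = np.linalg.inv(R_xx) + Hh @ np.linalg.solve(R_nn, H)      # R as defined after equation (15)

# Two forms of the factor that multiplies B~*_opt in equation (14).
gain_a = R_xx @ Hh @ np.linalg.inv(R_yy)
gain_b = np.linalg.solve(R, Hh @ np.linalg.inv(R_nn))

# Two forms of the matrix sandwiched between B~* and B~ in equation (15).
r_perp_a = R_xx - R_xx @ Hh @ np.linalg.solve(R_yy, H @ R_xx)
r_perp_b = np.linalg.inv(R)
```

The second form inverts a matrix whose size follows the stacked input vector rather than the stacked output vector, which may be the smaller of the two when oversampling or several receive antennas are used.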

It remains to optimize values for the B matrix and the W matrix such that, in 
response to specified conditions, the trace (or determinant) of R ee is minimized. 
The following discloses three approaches to such optimization. 
Scenario 1 

In this scenario, it is chosen to process only previous receiver decisions. 
These decisions relate to different users that concurrently have transmitted 
information that has been captured by antennas 21 and detected in circuit 25. 
That means that feedback element 26 uses only delayed information and that the 
0th order coefficients of the filters within element 26 have the value 0. Therefore, 
in light of the definition expressed in equation (10), this scenario imposes the 
constraint of $B_0 = I_{n_i}$. 

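To determine the optimum matrix feedback filter coefficients under this 
constraint, the following optimization problem needs to be solved: 

$$ \min_{\tilde{B}} R_{ee} = \min_{\tilde{B}} \tilde{B}^* R^{-1} \tilde{B}, \text{ subject to } \tilde{B}^* \Phi = C^*, \tag{16} $$

where 

$$ \Phi \equiv \begin{bmatrix} I_{n_i(\Delta+1)} \\ 0_{n_i N_b \times n_i(\Delta+1)} \end{bmatrix} \text{ and } C^* = [0_{n_i \times n_i \Delta} \;\; I_{n_i}]. \tag{17} $$

It can be shown that the solution to the above is given by 

$$ \tilde{B}_{opt} = R \Phi (\Phi^* R \Phi)^{-1} C, \tag{18} $$

resulting in the error signal 

$$ R_{ee,min} = C^* (\Phi^* R \Phi)^{-1} C. \tag{19} $$

If we define the partitioning 

$$ R \equiv \begin{bmatrix} R_{11} & R_{12} \\ R_{12}^* & R_{22} \end{bmatrix}, \tag{20} $$

where $R_{11}$ is of size $n_i(\Delta+1) \times n_i(\Delta+1)$, then 

$$ \tilde{B}_{opt} = \begin{bmatrix} I_{n_i(\Delta+1)} \\ R_{12}^* R_{11}^{-1} \end{bmatrix} C \tag{21} $$

and 

$$ R_{ee,min} = C^* R_{11}^{-1} C, \tag{22} $$

where the delay parameter $\Delta$ is adjusted to minimize the trace (or determinant) of 
$R_{ee,min}$. Once $\tilde{B}_{opt}$ is known, equation (14) is applied to develop $W^*_{opt}$. 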
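As a sanity check on equations (16) through (22), the sketch below (assuming NumPy, a fixed delay, and a random Hermitian positive-definite stand-in for $R$; all sizes and variable names are hypothetical) evaluates the constrained solution both ways and compares it with another feasible choice of $\tilde{B}$.

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, delta, n_b = 2, 1, 2                    # toy sizes; N_f + nu = delta + n_b + 1
n = n_i * (delta + n_b + 1)
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)           # stand-in for R = R_xx^{-1} + H* R_nn^{-1} H

m = n_i * (delta + 1)
Phi = np.vstack([np.eye(m), np.zeros((n - m, m))])               # equation (17)
C = np.vstack([np.zeros((n_i * delta, n_i)), np.eye(n_i)])       # C* = [0  I]

# Constrained minimizer, equations (18) and (19).
M = np.linalg.inv(Phi.conj().T @ R @ Phi)
B_18 = R @ Phi @ M @ C
Ree_19 = C.conj().T @ M @ C

# Partitioned form, equations (20) through (22).
R11, R12 = R[:m, :m], R[:m, m:]
B_21 = np.vstack([np.eye(m), R12.conj().T @ np.linalg.inv(R11)]) @ C
Ree_22 = C.conj().T @ np.linalg.inv(R11) @ C

# Another B~ that still satisfies B~* Phi = C* (only the rows below row m change).
perturb = np.vstack([np.zeros((m, n_i)), rng.standard_normal((n - m, n_i))])
B_other = B_21 + perturb
Ree_other = B_other.conj().T @ np.linalg.inv(R) @ B_other
```

In this toy case the two closed forms coincide, and the perturbed feasible matrix produces a larger trace than `Ree_22`, as the optimality claim predicts.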

FIG. 3 presents a flowchart for carrying out the method of determining the 
filter coefficients that processor 22 computes pursuant to scenario 1. Step 100 
develops an estimate of the MIMO channel between the input points and the 
output point of the actual transmission channel. This is accomplished in a 
conventional manner through the use of training sequences. The estimate of the 
MIMO channel can be chosen to be limited to a given memory length, $\nu$, or can be 
allowed to include as much memory as necessary to reach a selected estimate 
error level. That, in turn, depends on the environment and is basically equal to the 
delay spread divided by $T_s$. 

Following step 100, step 110 determines the matrices $R_{nn}$, $R_{xx}$, $R_{xy}$, and 
$R_{yy}$. The matrix $R_{nn}$ is computed by first computing $n = y - Hx$ and then computing 
the expected value $E[nn^*]$ (see equation (8) above). The matrix $R_{xx}$ is computed 
from the known training sequences (see equation (7) above), or is pre-computed 
and installed in processor 22. It may be noted that for uncorrelated 
inputs, $R_{xx} = I$. The matrices $R_{xy}$ and $R_{yy}$ are computed from the known training 
sequences and the received signal or directly from $H$ and $R_{nn}$ (see equations (5) 
and (6) above). 

Following step 110, step 120 computes $R = R_{xx}^{-1} + H^* R_{nn}^{-1} H$, and the partition 
components, $R_{11}$, $R_{12}$, and $R_{22}$, as per equation (20). Following step 120, step 130 
computes $R_{ee,min}$ from equation (22) and adjusts $\Delta$ to minimize the trace (or 
determinant) of $R_{ee,min}$, computes $\tilde{B}_{opt}$ from equation (21), and from $\tilde{B}_{opt}$ 
determines the coefficients of the $n_i \times n_i$ filters of element 26, pursuant to equation 
(10). Step 140 computes $W^*_{opt}$ from equation (14), and finally, step 150 installs the 
coefficients developed in step 130 into the filters of element 26 and the coefficients 
developed in step 140 into the filters of element 23. 
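Steps 110 through 140 can be strung together in a few lines. The following is a rough sketch under stated assumptions, not the processor's actual routine: it assumes NumPy, takes $H$, $R_{xx}$ and $R_{nn}$ as already estimated, searches the delay by trace only, and uses a random dense matrix in place of a true block-Toeplitz channel. The function name `design_mmse_dfe` and all sizes are hypothetical.

```python
import numpy as np

def design_mmse_dfe(H, R_xx, R_nn, n_i):
    """Steps 110-140 of FIG. 3 for scenario 1 (B_0 = I); returns delta, B~, W*, R_ee."""
    Hh = H.conj().T
    R_xy = R_xx @ Hh                                          # step 110, equation (5)
    R_yy = H @ R_xx @ Hh + R_nn                               # step 110, equation (6)
    R = np.linalg.inv(R_xx) + Hh @ np.linalg.solve(R_nn, H)   # step 120
    blocks = R.shape[0] // n_i                                # N_f + nu
    best = None
    for delta in range(blocks):                               # step 130: delay search
        m = n_i * (delta + 1)
        C = np.vstack([np.zeros((n_i * delta, n_i)), np.eye(n_i)])
        X = np.linalg.solve(R[:m, :m], C)                     # R_11^{-1} C
        R_ee = C.conj().T @ X                                 # equation (22)
        cost = np.trace(R_ee).real
        if best is None or cost < best[0]:
            B_tilde = np.vstack([C, R[:m, m:].conj().T @ X])  # equation (21)
            best = (cost, delta, B_tilde, R_ee)
    _, delta, B_tilde, R_ee = best
    W_star = B_tilde.conj().T @ R_xy @ np.linalg.inv(R_yy)    # step 140, equation (14)
    return delta, B_tilde, W_star, R_ee

rng = np.random.default_rng(5)
n_i, blocks, n_y = 2, 5, 12                  # toy sizes; blocks = N_f + nu
n_x = n_i * blocks
H = rng.standard_normal((n_y, n_x)) + 1j * rng.standard_normal((n_y, n_x))
R_xx = np.eye(n_x)                           # uncorrelated inputs
R_nn = 0.1 * np.eye(n_y)
delta, B_tilde, W_star, R_ee = design_mmse_dfe(H, R_xx, R_nn, n_i)

R_xy = R_xx @ H.conj().T
R_yy = H @ R_xx @ H.conj().T + R_nn
# Orthogonality principle: E[E_k y*] = B~* R_xy - W* R_yy should vanish.
orth_residual = B_tilde.conj().T @ R_xy - W_star @ R_yy
# Second line of equation (13), evaluated at the optimum.
W = W_star.conj().T
R_ee_direct = (B_tilde.conj().T @ R_xx @ B_tilde - B_tilde.conj().T @ R_xy @ W
               - W_star @ R_xy.conj().T @ B_tilde + W_star @ R_yy @ W)
```

Everything here is a direct evaluation of closed-form expressions, which is consistent with the non-iterative computation the text describes; the only loop is the search over candidate delays.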

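A second approach for computing $\tilde{B}_{opt}$ utilizes the block Cholesky 
factorization (which is a technique that is well known in the art): 

$$ R = \begin{bmatrix} L_1 & 0 \\ L_2 & L_3 \end{bmatrix} \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \begin{bmatrix} L_1^* & L_2^* \\ 0 & L_3^* \end{bmatrix} \equiv L D L^*, $$

where $L_1$ is of size $n_i(\Delta+1) \times n_i(\Delta+1)$. Using the result in equations (18) and (19) 
yields 

$$ \begin{aligned} \tilde{B}_{opt} &= \begin{bmatrix} I_{n_i(\Delta+1)} \\ L_2 L_1^{-1} \end{bmatrix} C = \begin{bmatrix} L_1 \\ L_2 \end{bmatrix} L_1^{-1} C \\ &= L \, [e_{n_i \Delta_{opt}} \;\; \cdots \;\; e_{n_i(\Delta_{opt}+1)-1}] \end{aligned} \tag{23} $$

and 

$$ \begin{aligned} R_{ee,min} &= C^* D_1^{-1} C \\ &= \mathrm{diag}(d^{-1}_{n_i \Delta_{opt}}, \ldots, d^{-1}_{n_i(\Delta_{opt}+1)-1}), \end{aligned} \tag{24} $$

where the index $\Delta_{opt}$ is chosen (as before) to minimize the trace and determinant 
of $R_{ee,min}$. Using equation (23), equation (14) can be expressed as follows 

$$ \begin{aligned} W^*_{opt} &= \tilde{B}^*_{opt} R_{xx} H^* (H R_{xx} H^* + R_{nn})^{-1} \\ &= \tilde{B}^*_{opt} (R_{xx}^{-1} + H^* R_{nn}^{-1} H)^{-1} H^* R_{nn}^{-1} \\ &= \begin{bmatrix} d^{-1}_{n_i \Delta_{opt}} e^*_{n_i \Delta_{opt}} \\ \vdots \\ d^{-1}_{n_i(\Delta_{opt}+1)-1} e^*_{n_i(\Delta_{opt}+1)-1} \end{bmatrix} L^{-1} H^* R_{nn}^{-1}. \end{aligned} \tag{25} $$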



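Yet a third approach for computing $\tilde{B}_{opt}$ and $R_{ee,min}$ defines $\tilde{B}^* = [C^* \;\; \bar{B}^*]$ 
and partitions $R^{\perp}$ as 

$$ R^{\perp} \equiv \begin{bmatrix} R^{\perp}_{11} & R^{\perp}_{12} \\ R^{\perp *}_{12} & R^{\perp}_{22} \end{bmatrix}, $$

where $R^{\perp}_{11}$ is of size $n_i(\Delta+1) \times n_i(\Delta+1)$, to yield 

$$ \begin{aligned} R_{ee} &= \tilde{B}^* R^{\perp} \tilde{B} \\ &= [C^* \;\; \bar{B}^*] \begin{bmatrix} R^{\perp}_{11} & R^{\perp}_{12} \\ R^{\perp *}_{12} & R^{\perp}_{22} \end{bmatrix} \begin{bmatrix} C \\ \bar{B} \end{bmatrix} \\ &= [I_{n_i} \;\; \bar{B}^*] \begin{bmatrix} \bar{R}^{\perp}_{11} & \bar{R}^{\perp}_{12} \\ \bar{R}^{\perp *}_{12} & R^{\perp}_{22} \end{bmatrix} \begin{bmatrix} I_{n_i} \\ \bar{B} \end{bmatrix} \\ &= (\bar{R}^{\perp}_{11} - \bar{R}^{\perp}_{12} (R^{\perp}_{22})^{-1} \bar{R}^{\perp *}_{12}) + (\bar{B}^* + \bar{R}^{\perp}_{12} (R^{\perp}_{22})^{-1}) R^{\perp}_{22} (\bar{B}^* + \bar{R}^{\perp}_{12} (R^{\perp}_{22})^{-1})^*, \end{aligned} \tag{26} $$

where $\bar{R}^{\perp}_{11} \equiv C^* R^{\perp}_{11} C$ and $\bar{R}^{\perp}_{12} \equiv C^* R^{\perp}_{12}$. Therefore, 

$$ \bar{B}^*_{opt} = -\bar{R}^{\perp}_{12} (R^{\perp}_{22})^{-1} \tag{27} $$

$$ W^*_{opt} = [0_{n_i \times n_i \Delta} \;\; I_{n_i} \;\; -\bar{R}^{\perp}_{12} (R^{\perp}_{22})^{-1}] \, R_{xx} H^* (H R_{xx} H^* + R_{nn})^{-1}. \tag{28} $$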



Scenario 2 

In this scenario it is assumed that users whose signals are received by the 
FIG. 1 receiver are ordered so that lower-indexed users are detected first, and 
current decisions from lower-indexed users are used by higher-indexed users in 
making their decisions, i.e., $B_0$ is a lower-triangular matrix. The general results of 
equations (21) and (22) can be applied by setting $C^* = [0_{n_i \times n_i \Delta} \;\; B_0^*]$, where $B_0$ is an 
$n_i \times n_i$ monic lower-triangular matrix whose entries are optimized to minimize 
trace($R_{ee,min}$). To this end, a partitioning can be defined where 

$$ R_{11}^{-1} \equiv \begin{bmatrix} R_1 & R_2 \\ R_2^* & R_3 \end{bmatrix}, \tag{29} $$

$R_{11}$ being the term defined in equation (20), with $R_1$ being of size 
$n_i \Delta \times n_i \Delta$, and $R_3$ being of size $n_i \times n_i$. Equation (22) simplifies to 

$$ R_{ee,min} = B_0^* R_3 B_0. \tag{30} $$

It can be shown that the optimum monic lower-triangular $B_0$ that minimizes 
trace($R_{ee,min}$) is given by the monic lower-triangular Cholesky factor of $R_3^{-1}$, i.e., 

$$ R_3^{-1} = L_3 D_3 L_3^*, \tag{31} $$

which yields 

$$ B_0^{opt} = L_3 \tag{32} $$

and 

$$ R_{ee,min} = D_3^{-1}. \tag{33} $$

The result is that $\tilde{B}_{opt} = \begin{bmatrix} I_{n_i(\Delta+1)} \\ R_{12}^* R_{11}^{-1} \end{bmatrix} C$, as expressed in equation (21), with the 
modified value of $R_{11}$ and with 

$$ C^* = [0_{n_i \times n_i \Delta} \;\; B_0^*]. \tag{34} $$
A second approach for computing the optimum FIR filter coefficients for the 
FIG. 1 receiver involves computing a standard, rather than a block, Cholesky 
factorization of the matrix $R = R_{xx}^{-1} + H^* R_{nn}^{-1} H$ (see the definition following equation 
(15)) in the form $LDL^*$. Then, the coefficients of the element 23 filters are given by 
the $n_i$ adjacent columns of $L$ that correspond to a diagonal matrix with the smallest 
trace. Therefore, equations (23) and (25) are used to compute the corresponding 
coefficients, with the understanding that $L$ is now a lower-triangular matrix, rather 
than a block lower-triangular matrix. The equivalence of the two approaches can 
be shown using the nesting property of Cholesky factorization. 

FIG. 4 presents a flowchart for carrying out the method of determining the 
filter coefficients that processor 22 computes pursuant to scenario 2. Steps 100 
through 120 are the same as in FIG. 3, but the method diverges somewhat in the 
following steps. In step 131 the partition according to equation (20) is developed 
for a $\Delta$ that minimizes $R_{ee,min}$ of equation (33), and control passes to step 141, 
where $B_0^{opt}$ is computed based on equations (31) and (32), followed by a 
computation of $\tilde{B}_{opt}$ based on equations (21) and (34). Following step 141, step 
151 computes $W^*_{opt}$ from equation (14), and finally, step 161 installs the 
coefficients developed in step 141 into the filters of element 26 and the coefficients 
developed in step 151 into the filters of element 23. 
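The equivalence claimed for scenario 2 between the standard Cholesky route and the partition route of equations (21) and (29) through (34) can be checked numerically. The sketch below assumes NumPy and a random Hermitian positive-definite stand-in for $R$; the helper `ldl_from_cholesky` and the toy sizes are hypothetical, and the factorization $LDL^*$ is obtained by rescaling the columns of the ordinary Cholesky factor.

```python
import numpy as np

def ldl_from_cholesky(R):
    """Unit-lower-triangular L and diagonal entries d with R = L diag(d) L*."""
    G = np.linalg.cholesky(R)                # R = G G*, G lower-triangular
    g = np.real(np.diag(G))
    return G / g, g ** 2                     # column j of G is scaled by 1/g_jj

rng = np.random.default_rng(3)
n_i, blocks = 2, 4                           # toy sizes; blocks = N_f + nu
n = n_i * blocks
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = A @ A.conj().T + n * np.eye(n)           # stand-in for R_xx^{-1} + H* R_nn^{-1} H

L, d = ldl_from_cholesky(R)

# Pick the n_i adjacent columns whose 1/d entries have the smallest sum.
costs = [np.sum(1.0 / d[n_i * k:n_i * (k + 1)]) for k in range(blocks)]
delta = int(np.argmin(costs))
cols = slice(n_i * delta, n_i * (delta + 1))
B_tilde_ldl = L[:, cols]                     # equation (23)
R_ee_ldl = np.diag(1.0 / d[cols])            # equation (24)

# Cross-check against the partition route, equations (21) and (29)-(34).
m = n_i * (delta + 1)
R11_inv = np.linalg.inv(R[:m, :m])
R3 = R11_inv[-n_i:, -n_i:]                   # lower-right block in equation (29)
L3, d3 = ldl_from_cholesky(np.linalg.inv(R3))           # equation (31)
C = np.vstack([np.zeros((n_i * delta, n_i)), L3])       # C* = [0  B_0*] with B_0 = L_3
B_tilde_part = np.vstack([np.eye(m), R[:m, m:].conj().T @ R11_inv]) @ C
R_ee_part = L3.conj().T @ R3 @ L3            # equation (30)
```

With a single factorization of $R$, every candidate delay is evaluated by summing a few reciprocals of diagonal entries, so no per-delay matrix inversion is needed in this route.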
Scenario 3 

When multistage detectors are employed, current decisions from all other 
users, obtained from a previous detection stage, are available to the user of 
interest. Therefore, suppressing their interfering effects would improve the 
performance of the receiver. This detection scenario has the same mathematical 
formulation as scenarios 1 and 2, except that $B_0$ is now constrained only to be 
monic, i.e., $e_i^* B_0 e_i = 1$ for all $0 \le i \le n_i - 1$. The general results in equations (21) 
and (22) still apply with $C^* = [0_{n_i \times n_i \Delta} \;\; B_0^*]$, where $B_0$ is optimized to minimize 
trace($R_{ee,min}$). In short, under scenario 3, the following optimization problem is 
solved: 

$$ \min_{B_0} \mathrm{trace}(B_0^* R_3 B_0) \text{ subject to } e_i^* B_0 e_i = 1 \text{ for all } 0 \le i \le n_i - 1, \tag{33} $$

where $R_3$ is as defined in equation (29). Using Lagrange multiplier techniques, it 
can be shown that the optimum monic $B_0$ and the corresponding MMSE are given 
by 

$$ B_0^{opt} e_i = \frac{R_3^{-1} e_i}{e_i^* R_3^{-1} e_i}; \quad 0 \le i \le n_i - 1. \tag{34} $$

Thus, the method of determining the filter coefficients that processor 22 computes 
pursuant to scenario 3 is the same as the method depicted in FIG. 4, except that 
the computation of $B_0^{opt}$ within step 141 follows the dictates of equation (34). 
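The monic-constrained minimization of scenario 3 separates column by column, which the following sketch illustrates. It assumes NumPy and a random Hermitian positive-definite stand-in for $R_3$; the function name `monic_b0` is hypothetical, and the per-user error value computed at the end is obtained here by substituting the solution back into the objective rather than quoted from the text.

```python
import numpy as np

def monic_b0(R3):
    """Column i of B_0 is R3^{-1} e_i / (e_i* R3^{-1} e_i), so B_0 has a unit diagonal."""
    R3_inv = np.linalg.inv(R3)
    return R3_inv / np.real(np.diag(R3_inv))     # divide column i by (R3^{-1})_{ii}

rng = np.random.default_rng(4)
n_i = 3
A = rng.standard_normal((n_i, n_i)) + 1j * rng.standard_normal((n_i, n_i))
R3 = A @ A.conj().T + n_i * np.eye(n_i)          # toy Hermitian positive-definite R_3

B0 = monic_b0(R3)
cost_opt = np.trace(B0.conj().T @ R3 @ B0).real

# Two other monic choices for comparison: the identity (the scenario 1 constraint)
# and a random perturbation of the off-diagonal entries.
cost_identity = np.trace(R3).real
P = rng.standard_normal((n_i, n_i)) * 0.1
np.fill_diagonal(P, 0.0)
cost_perturbed = np.trace((B0 + P).conj().T @ R3 @ (B0 + P)).real

# Per-user error implied by this solution: 1 / (R3^{-1})_{ii}.
mmse = 1.0 / np.real(np.diag(np.linalg.inv(R3)))
```

In this toy case the optimized trace equals the sum of the per-user values and does not exceed either comparison cost, which is what one would expect when the only constraint left on $B_0$ is its unit diagonal.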
With the above analysis in mind, a design of the filter coefficients of the 
filters within elements 23 and 26 can proceed for any given set of system 
parameters, which includes: 

o MIMO channel memory between the input points and the output point of the 
  actual transmission channel, $\nu$, 
o The number of pre-filter taps chosen, $N_f$, 
o The shortened MIMO memory, $N_b$, 
o The number of inputs to the transmission channel, $n_i$, 
o The number of outputs derived from the transmission channel, $n_o$, 
o The autocorrelation matrix of the inputs, $R_{xx}$, 
o The autocorrelation matrix of the noise, $R_{nn}$, 
o The oversampling used, $l$, and 
o The decision delay, $\Delta$. 

It should be understood that a number of aspects of the above disclosure 
are merely illustrative, and that persons skilled in the art may make various 
modifications that, nevertheless, are within the spirit and scope of this invention. 